MULTIVARIATE SPACINGS BASED ON DATA DEPTH: I.
CONSTRUCTION OF NONPARAMETRIC MULTIVARIATE
TOLERANCE REGIONS

By Jun Li and Regina Y. Liu 

University of California at Riverside and Rutgers University 

This paper introduces and studies multivariate spacings. The 
spacings are developed using the order statistics derived from data 
depth. Specifically, the spacing between two consecutive order statis- 
tics is the region which bridges the two order statistics, in the sense 
that the region contains all the points whose depth values fall between 
the depth values of the two consecutive order statistics. These multi- 
variate spacings can be viewed as a data-driven realization of the so- 
called "statistically equivalent blocks." These spacings assume a form 
of center-outward layers of "shells" ("rings" in the two-dimensional 
case), where the shapes of the shells follow closely the underlying 
probabilistic geometry. The properties and applications of these spac- 
ings are studied. In particular, the spacings are used to construct 
tolerance regions. The construction of tolerance regions is nonpara- 
metric and completely data driven, and the resulting tolerance region 
reflects the true geometry of the underlying distribution. This is dif- 
ferent from most existing approaches which require that the shape of 
the tolerance region be specified in advance. The proposed tolerance 
regions are shown to meet the prescribed specifications, in terms of 
$\beta$-content and $\beta$-expectation. They are also asymptotically minimal
under elliptical distributions. Finally, a simulation and comparison 
study on the proposed tolerance regions is presented. 

1. Introduction. The term "spacings" in statistics generally refers to ei- 
ther the intervals (or gaps) between two consecutive order statistics or the 
lengths of these intervals. Spacings have been used extensively in probabil- 
ity and statistics, especially in the areas of distributional characterization, 
extreme value theory and nonparametric inference. There is a rich literature
on the theory and applications of spacings. The excellent treatise by Pyke in 
[22] as well as the references therein (e.g., [5] and [27]) and thereafter (e.g., 
[2, 4, 9, 12, 28]) all attest to the importance of spacings. In his paper [22] 
Pyke wrote:

Perhaps the most significant restrictions of this paper has been our concern with
one-dimensional spacings. There are many applications in which samples are drawn
from two- or even three-dimensional space and for which it is important to study the 
spacings of the observations. 

Although research on spacings has continued, his call for multivariate spac- 
ings has remained largely unanswered. The main difficulty in generalizing the 
univariate spacings to multivariate settings is the lack of suitable ordering 
schemes for multivariate observations. This paper has two goals. First, we 
introduce multivariate spacings using the multivariate ordering derived from 
the notion of data depth. Second, as an application, we apply the proposed 
multivariate spacings to construct nonparametric tolerance regions. 

The paper is organized as follows. Section 2 is devoted to the develop- 
ment of multivariate spacings. We begin with a brief review of the univariate 
spacings and of some of their properties, as well as a brief description of the 
subject of data depth and the corresponding depth ordering of multivariate 
data. Note that the depth ordering is from the center outward rather than
the usual univariate linear ordering from the smallest to the largest. For any 
two consecutive depth order statistics, we define the spacing between them 
as the region that contains all the points in the sample space whose depth 
values fall between the depth values of the two order statistics. The multi- 
variate spacings are the collection of these regions formed by all pairs of con- 
secutive order statistics. These regions generally appear as center-outward 
layers of "shells" ("rings" in $\mathbb{R}^2$), and the shapes of the shells follow closely
the probabilistic geometry of the underlying distribution. In Section 3 we 
first provide a review of tolerance intervals for univariate data as well as the 
existing approaches for obtaining multivariate tolerance regions. We then 
describe the construction of nonparametric tolerance regions using the pro- 
posed multivariate spacings, and investigate the properties of the proposed 
tolerance regions. Specifically, we show that these tolerance regions: (1) meet 
the prescribed specifications in terms of $\beta$-content and $\beta$-expectation, and
(2) are asymptotically minimal under a certain class of distributions which 
includes the elliptical family. The formation of our tolerance region is com- 
pletely data driven and nonparametric, and the resulting tolerance region 
has the desirable property of reflecting accurately the underlying probabilis- 
tic geometry. In other words, the shape of our proposed tolerance regions 
is automatically determined by the given data, and does not need to be 
specified in advance. Most existing approaches require pre-specification of
the shape, which can be considered arbitrary or subjective. It is also worth
noting that our tolerance region is always connected, which is more suitable 
in applications such as quality control. Section 4 contains a simulation study 
and some comparisons with other existing tolerance regions. It confirms sev- 
eral desirable features of our approach. Section 5 contains some concluding 
remarks. Most technical proofs are collected in the Appendix. 

We also observe in Section 3.1 that in using our multivariate spacings to 
construct tolerance regions, we have in effect argued that our multivariate 
spacings are an ideal realization of the so-called "statistically equivalent 
blocks." This is because the realization of our multivariate spacings and 
their shapes are entirely data driven. Statistically equivalent blocks were
considered by Tukey in [24] and several follow-up papers (see, e.g.,
[10]) as possible building blocks for the construction of tolerance regions or
tools for characterizing distributions. However, these papers again all need
to pre-specify the shapes (e.g., rectangles or circles in $\mathbb{R}^2$) of the blocks.

2. Multivariate spacings derived from data depth. We begin with a brief 
review of the notion of data depth and its corresponding multivariate order- 
ing. This multivariate ordering naturally leads to our multivariate spacings. 

2.1. Data depth and center-outward ordering of multivariate data. A
data depth is a measure of "depth" of a given point with respect to a
multivariate data cloud or its underlying distribution, and it gives rise to a natural
center-outward ordering of the points in a multivariate sample. Although the
actual depth value has been used widely to develop robust multivariate infer- 
ence, the depth-ordering is less understood and still underutilized. Existing 
notions of data depth include: Mahalanobis depth ([20]), half-space depth 
([14, 25]), simplicial depth ([16]), projection depth ([7, 8, 23, 30]), etc. More 
discussion on different notions of data depth can be found in [17, 31]. 

To help facilitate the coming exposition of multivariate spacings, we use 
the simplicial depth to illustrate the general concept of data depth and 
its corresponding center-outward ordering. Let $\{X_1, \ldots, X_n\}$ be a random
sample from the distribution $F(\cdot) \in \mathbb{R}^p$, $p \geq 2$. Consider the bivariate setting,
$p = 2$. Let $\Delta(a, b, c)$ denote the triangle with vertices $a$, $b$ and $c$. Let $I(\cdot)$ be
the indicator function, that is, $I(A) = 1$ (or 0) if $A$ occurs (or not). For the
given sample $\{X_1, \ldots, X_n\}$, the sample simplicial depth of $x$ is defined as

(2.1) $D_{F_n}(x) = \binom{n}{3}^{-1} \sum_{(*)} I(x \in \Delta(X_{i_1}, X_{i_2}, X_{i_3}))$,

which is the fraction of the triangles generated from the sample that contain
the point $x$. Here $(*)$ runs over all possible triplets of $\{X_1, \ldots, X_n\}$. A larger
value of $D_{F_n}(x)$ indicates that $x$ falls in more triangles generated from the
sample, and thus lies deeper within the data cloud.

The above can be generalized to dimension $p$ by counting simplices rather
than triangles, that is

(2.2) $D_{F_n}(x) = \binom{n}{p+1}^{-1} \sum_{(*)} I(x \in s[X_{i_1}, \ldots, X_{i_{p+1}}])$,

where $(*)$ runs over all possible subsets of $\{X_1, \ldots, X_n\}$ of size $(p + 1)$. Here
$s[X_{i_1}, \ldots, X_{i_{p+1}}]$ is the closed simplex whose vertices are $\{X_{i_1}, \ldots, X_{i_{p+1}}\}$.

If $F$ is given, the simplicial depth of $x$ w.r.t. $F$ is defined as $D_F(x) =
P_F\{x \in s[X_1, \ldots, X_{p+1}]\}$, where $X_1, \ldots, X_{p+1}$ are $(p + 1)$ random
observations from $F$. $D_F(x)$ measures how "deep" $x$ is w.r.t. $F$, and $D_{F_n}(x)$ in (2.2)
is a sample estimate of $D_F(x)$. A fuller motivation together with the key
properties of data depth can be found in [16]. In particular, it is shown that
$D_F(\cdot)$ is affine invariant, and that $D_{F_n}(\cdot)$ converges uniformly and strongly
to $D_F(\cdot)$. The affine invariance ensures that our proposed spacings and
inference methods are coordinate free, and the convergence of $D_{F_n}$ to $D_F$ allows
us to approximate $D_F(\cdot)$ by $D_{F_n}(\cdot)$ if $F$ is unknown.
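As an illustration of (2.1), the following Python sketch counts the closed triangles that contain a query point in the bivariate case. The function names (`simplicial_depth_2d`, `in_closed_triangle`) are ours and do not appear in the paper. The brute-force enumeration of all triplets only mirrors the definition; it assumes a sample in general position and is not meant as an efficient algorithm.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of the triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    """True if x lies inside or on the boundary of the triangle (a, b, c)."""
    d1, d2, d3 = _orient(a, b, x), _orient(b, c, x), _orient(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # x is outside exactly when the three orientations disagree in sign.
    return not (has_neg and has_pos)


def simplicial_depth_2d(x, sample):
    """Sample simplicial depth (2.1): fraction of sample triangles containing x."""
    triples = list(combinations(sample, 3))
    hits = sum(in_closed_triangle(x, a, b, c) for a, b, c in triples)
    return hits / len(triples)
```

For example, with the four corner points of a square as the sample, a point near one edge falls in two of the four triangles, while a point far outside the square has depth zero.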

For the given sample $\{X_1, X_2, \ldots, X_n\}$, we calculate the depth values
$D_{F_n}(X_i)$'s and then order the $X_i$'s according to their descending depth
value. Denoting by $X_{[j]}$ the sample point associated with the $j$th largest
depth value, we then obtain the sequence $\{X_{[1]}, X_{[2]}, \ldots, X_{[n]}\}$ which is the
depth order statistics of $X_i$'s, with $X_{[1]}$ being the deepest point, and $X_{[n]}$ the
most outlying. Here, a larger order is associated with a more outlying
position w.r.t. the underlying distribution. Note that the order statistics derived
from depth are different from the usual order statistics in the univariate case,
since the latter are ordered from the smallest sample point to the largest,
while the former start from the middle sample point and move outward in
all directions. Figure 1 helps demonstrate this feature of the depth ordering.
The two plots show two random samples, each of size 500, drawn respectively
from the standard bivariate normal and bivariate exponential distributions.
For each plot, the "+" marks the deepest point, and the innermost convex
hull encloses the deepest 20% of the sample points. The convex hull expands
outward to enclose the next deepest 20% with each expansion. Those convex
hulls determined by the decreasing depth value are nested, a feature
indicating that the depth ordering is from the center outward. Note that the shape
of the depth contours in those plots clearly reflects the underlying
probabilistic geometry, relatively spherical in the normal case and fanning upper-right
triangularly in the exponential case. The nested shell-like depth contours
in Figure 1 also help illustrate the features of the multivariate spacings in
Section 2.3.

Fig. 1. Depth contours for: (a) bivariate normal sample; (b) bivariate exponential sample.

We give the definition of Mahalanobis depth here, since it is also used in 
the simulation study later in Section 4. 

Definition 2.1. The Mahalanobis depth ([20]) at $x$ with respect to $F$
is defined as

$MD_F(x) = [1 + (x - \mu_F)' \Sigma_F^{-1} (x - \mu_F)]^{-1}$,

where $\mu_F$ and $\Sigma_F$ are the mean vector and dispersion matrix of $F$,
respectively. The sample version of the Mahalanobis depth is obtained by replacing
$\mu_F$ and $\Sigma_F$ with their sample estimates.
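The sample version of Definition 2.1 and the resulting depth ordering can be sketched in Python for the bivariate case. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the dispersion matrix is estimated with divisor $n - 1$ (an assumption, since the text does not fix the estimator), and the $2 \times 2$ inverse is written out by hand to keep the block free of dependencies.

```python
def sample_mahalanobis_depth_2d(sample):
    """Return a function x -> sample Mahalanobis depth w.r.t. a bivariate sample."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    # Sample dispersion matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("sample dispersion matrix is singular")

    def depth(x):
        dx, dy = x[0] - mx, x[1] - my
        # Quadratic form (x - mean)' S^{-1} (x - mean) via the explicit 2x2 inverse.
        q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        return 1.0 / (1.0 + q)

    return depth


def depth_order(sample, depth):
    """Sample points sorted by descending depth: X_[1] (deepest), ..., X_[n]."""
    return sorted(sample, key=depth, reverse=True)
```

The depth equals 1 at the sample mean and decreases toward 0 along every direction, so `depth_order` lists the points from the center outward.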

Different notions of depth are capable of capturing different aspects of the 
probabilistic geometry, and may lead to different ordering schemes. However, 
all the depth orderings are essentially from the center outward. We note that
all the aforementioned depths are affine invariant, and so are their resulting
orderings. The affine invariance is a desirable feature for the construction of 
multivariate spacings later in Section 2.3. 

Note that geometric depths such as the half-space and the simplicial 
depths are completely nonparametric and moment-free, and they capture 
well the underlying probabilistic geometry of the data. Although the Maha- 
lanobis depth captures less well the underlying geometry unless the geometry 
happens to be elliptical, it is computationally more feasible than geometric
depths. Under elliptical distributions, the two geometric depths capture
fairly well the elliptical structure in the large sample case and are close
competitors to the Mahalanobis depth. Between the two geometric depths, the
simplicial depth provides a finer ordering and produces fewer ties than the
half-space depth. This point has been observed in [18].

For convenience, we will use the notation $D(\cdot)$ to express any valid notion
of depth, unless a particular notion is to be emphasized.

Before we use depth order statistics to formulate multivariate spacings, 
we review the univariate spacings and some of their properties. 

2.2. Univariate spacings. Let $X_1, X_2, \ldots, X_n$ be a random sample from
a univariate continuous distribution $F$ which has the support $(a, b)$. Denote
by $X_{[1]}, X_{[2]}, \ldots, X_{[n]}$ the order statistics of $X_i$'s, namely
$X_{[1]} \leq X_{[2]} \leq \cdots \leq X_{[n]}$. Note that we will avoid introducing additional messy notation for
differentiating the univariate setting from the multivariate one by using the
same notation $X_{[j]}$ throughout the paper to indicate the $j$th order statistic
of the sample $X_i$'s. It will generally be clear from the context whether the
notation is intended for the univariate or for the multivariate setting. If
needed, the phrase "univariate ordering" or "depth ordering" will be used
to emphasize the univariate or the multivariate ordering.

Given the order statistics $X_{[1]} \leq X_{[2]} \leq \cdots \leq X_{[n]}$, the univariate spacings
of the sample refer to the intervals $L_i = (X_{[i-1]}, X_{[i]}]$, $i = 1, \ldots, n + 1$, with
$X_{[0]} = a$ and $X_{[n+1]} = b$, or their lengths $D_i = X_{[i]} - X_{[i-1]}$. For convenience,
we proceed to discuss the spacings by assuming that $F$ follows the uniform
distribution on $(0, 1)$ [denoted by $F \sim U(0, 1)$], since the probability
integral transformation $F(X)$ transforms the given sample into a sample from
$U(0, 1)$. If $F \sim U(0, 1)$, then:

(i) $D_1 + D_2 + \cdots + D_{n+1} = 1$, and

(ii) the density function of $(D_1, D_2, \ldots, D_{n+1})$ is

$f(d_1, \ldots, d_{n+1}) = n!$ if $d_i \geq 0$ and $d_1 + d_2 + \cdots + d_{n+1} = 1$, and $f(d_1, \ldots, d_{n+1}) = 0$ otherwise.

Thus the density function $f$ is completely symmetrical in its arguments.

[21] and [22] have observed that the uniform spacings $(D_1, D_2, \ldots, D_{n+1})$
can be viewed as exponential random variables proportional to their sum.
Specifically, assume that $\{U_1, U_2, \ldots, U_{n+1}\}$ is a random sample from the
exponential distribution with mean 1 [denoted as $Exp(1)$], and let

$S = U_1 + U_2 + \cdots + U_{n+1}$ and $W_i = U_i / S$, $i = 1, \ldots, n + 1$.

Then, $(W_1, W_2, \ldots, W_{n+1})$ and $(D_1, D_2, \ldots, D_{n+1})$ are identically distributed.

A similar property will appear during our formulation of multivariate
spacings later.

2.3. Multivariate spacings. The main difficulty in extending the univariate
spacings to higher dimensions lies in the lack of proper ordering of the
multivariate data. Applying the center-outward ordering induced from data
depth, the multivariate spacings can be defined as follows.
Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution $F$
in $\mathbb{R}^p$, $p \geq 2$. For a given data depth $D(\cdot)$, we calculate all $D_F(X_i)$'s, and
obtain the depth order statistics $X_{[1]}, \ldots, X_{[n]}$ in descending depth values.
Let $Z_i = D_F(X_i)$, and $Z^{[i]} = D_F(X_{[i]})$, for $i = 1, \ldots, n$. Note that $Z^{[1]} \geq \cdots \geq Z^{[n]}$,
which are reverse univariate order statistics of $Z_i$'s. The matching
indices in $Z^{[1]} \geq \cdots \geq Z^{[n]}$ and $X_{[1]}, \ldots, X_{[n]}$ are useful for tracking depth
order statistics with their depth values in defining multivariate spacings and
tolerance regions later. We now define the multivariate spacings as follows,

(2.3) $MS_i = \{X : Z^{[i-1]} > D_F(X) \geq Z^{[i]}\}$, $i = 1, \ldots, n + 1$,

with $Z^{[0]} = \sup_x \{D_F(x)\}$ and $Z^{[n+1]} = 0$. The corresponding sample
multivariate spacings are

(2.4) $MS_i = \{X : Z^{[i-1]} > D_{F_n}(X) \geq Z^{[i]}\}$, $i = 1, \ldots, n$, and
$MS_{n+1} = \{X : D_{F_n}(X) < Z^{[n]}\}$,

where $Z^{[0]} = \sup_x \{D_{F_n}(x)\}$, and $Z^{[1]} \geq \cdots \geq Z^{[n]}$ are the reverse order
statistics of $Z_i = D_{F_n}(X_i)$, $i = 1, \ldots, n$.
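Given the sample depth values, deciding which sample spacing in (2.4) contains a point reduces to counting how many sample points are strictly deeper. The short Python sketch below makes this explicit. The function name `spacing_index` is hypothetical, and the convention that a point at least as deep as $X_{[1]}$ is assigned to the innermost spacing is an assumption of the sketch.

```python
def spacing_index(depth_x, sample_depths):
    """Index i in 1..n+1 of the sample spacing MS_i holding a point of depth depth_x.

    With Z^[1] >= ... >= Z^[n] the descending sample depths, MS_i collects the
    points with Z^[i-1] > depth >= Z^[i], and MS_{n+1} those with depth < Z^[n].
    """
    # If exactly k sample depths exceed depth_x, then Z^[k] > depth_x >= Z^[k+1].
    deeper = sum(1 for z in sample_depths if z > depth_x)
    return deeper + 1
```

The input `sample_depths` may be given in any order, since only the count of strictly larger values matters.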

Note that the multivariate spacings here define the "gap" between two 
consecutive depth order statistics as the shell-shape region bridging the two 
order statistics, generalizing the interval linking the two consecutive order 
statistics in the univariate spacings. Consequently, the multivariate spacings 
derived from depth order statistics are center-outward layers of "shells." 
Figure 2 illustrates an example of multivariate spacings determined by a 
random sample of size five drawn from the bivariate normal distribution with
mean $(0, 0)$ and a given covariance matrix. The five data points are denoted
by circles in the plot. The Mahalanobis depth is used to calculate depth
values. The multivariate spacings include six regions, five center-outward
layered shells and the outermost region. Note that the shells clearly reflect
the elliptical shape of the underlying distribution. Plots of the multivariate 
spacings for the standard bivariate normal and exponential samples using the 
simplicial depth show layered shells with shapes similar to those of Figure 1. 
Again, the shape of shells reflects the underlying geometric features. 

Next, we observe a useful property regarding the coverage probabilities 
of the proposed multivariate spacings. 

Theorem 2.1. Let $X_1, \ldots, X_n$ be a random sample from $F \in \mathbb{R}^p$.
Assume that the notion of data depth used in deriving the multivariate spacings
(2.3) is affine invariant. Then, the coverage probabilities of these
multivariate spacings, namely $\{P_F(MS_1), \ldots, P_F(MS_{n+1})\}$, follow the same
distribution as the univariate spacings $\{D_1, \ldots, D_{n+1}\}$ of a $U(0, 1)$ sample.

Proof. Let $Z_i = D_F(X_i)$ and $T_i = P_F(X : D_F(X) \geq Z_i)$, for $i = 1, \ldots, n$.
Then $T_i$'s can be considered as a random sample drawn from $U[0, 1]$, as
seen in [19]. Let $T_{[1]} \leq \cdots \leq T_{[n]}$ be the order statistics of $T_i$'s. It is clear
that $T_{[i]} = P_F(X : D_F(X) \geq Z^{[i]})$. Therefore $P_F(MS_i) = T_{[i]} - T_{[i-1]}$, where
$T_{[0]} = 0$ and $T_{[n+1]} = 1$, and thus the theorem follows. □
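Theorem 2.1 can be checked numerically in a case where the coverage is available in closed form. The sketch below assumes that $F$ is the standard bivariate normal and that $D_F$ is the population Mahalanobis depth, so that $\{x : D_F(x) \geq z\}$ is a disc centered at the origin and $P_F(|X|^2 \leq q) = 1 - e^{-q/2}$. These choices, the function names and the simulation sizes are ours. If the theorem holds, the coverage of the union of the $r$ innermost spacings should average $r/(n+1)$.

```python
import math
import random


def inner_coverage(n, r, rng):
    """Coverage P_F of the union of the r innermost spacings MS_1, ..., MS_r.

    F is the standard bivariate normal with the population Mahalanobis depth,
    so the region {D_F >= Z^[r]} is the disc through the r-th deepest point.
    """
    sq_radii = sorted(rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
                      for _ in range(n))
    # The r-th deepest point X_[r] has the r-th smallest squared radius.
    return 1.0 - math.exp(-sq_radii[r - 1] / 2.0)


def mean_inner_coverage(n, r, reps=20000, seed=0):
    """Monte Carlo average of inner_coverage over reps independent samples."""
    rng = random.Random(seed)
    return sum(inner_coverage(n, r, rng) for _ in range(reps)) / reps
```

With $n = 9$ and $r = 3$, the simulated average should be close to $3/10$, the mean of the sum of three uniform spacings out of ten.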

3. Tolerance region based on multivariate spacings. A confidence inter- 
val is used to provide an interval estimate for a parameter of interest with 
a stated confidence level. In production processes or quality control, it is 
customary to seek an interval that covers a certain proportion of the pro- 
cess distribution with a stated confidence as an assurance for meeting the 
required product specification. Intervals which fulfill this need are called 
tolerance intervals. In many practical situations, the quality of a product is 
specified by multiple characteristics of the product. To ensure the specifica- 
tions of those multiple characteristics simultaneously, multivariate tolerance 
regions are needed. Tolerance intervals and regions are integral parts of ap- 
plications in reliability theory and quality control. They allow the control 
of intended proportions of productions to meet the specified requirements. 
A high percentage of the production outside this interval (or region) will 
result in a high loss or rework rate. Before we describe our proposed con- 
struction of tolerance regions, we briefly review the literature of tolerance 
intervals and regions. 

Fig. 2. Multivariate spacings for a bivariate normal sample.

If the underlying process distribution is known, from either the design
of experiment or the knowledge gained over long experience, the tolerance
intervals or regions usually can be established. For example, if the sample is
drawn from $N(\mu, \sigma^2)$, a normal distribution with the known mean $\mu$ and
variance $\sigma^2$, and if we define tolerance intervals as those which contain $100\beta\%$
of the underlying distribution, then the shortest tolerance interval is simply
$(\mu - z_{(1-\beta)/2}\sigma, \mu + z_{(1-\beta)/2}\sigma]$. Here $z_{(1-\beta)/2}$ is the upper $(1 - \beta)/2$th quantile
of the standard normal distribution. The constant $\beta$ is referred to as the
tolerance level. This development can be extended in a straightforward manner
to the setting of a $p$-dimensional normal distribution with mean vector $\mu$ and
covariance matrix $\Sigma$ [denoted as $N(\mu, \Sigma)$]. In this case, the corresponding
smallest tolerance region can be constructed as an ellipsoid. It follows the
elliptical level sets of the underlying multivariate normal distribution and
satisfies

$Q_{\mu,\Sigma}(t) = \{X : (X - \mu)' \Sigma^{-1} (X - \mu) \leq t\}$,

where $t$ is the solution of the equation

$\int \cdots \int_{\{r^2 \leq t\}} (2\pi)^{-p/2} e^{-r^2/2} r^{p-1} \prod_{i=1}^{p-2} |\sin^{p-i-1} \theta_i| \, dr \, d\theta_1 \cdots d\theta_{p-1} = \beta$,

that is, $t$ is the $\beta$-quantile of the chi-square distribution with $p$ degrees
of freedom.

If the distribution or its parameters are unknown, the following two defi- 
nitions of tolerance regions have been considered and accepted as standard 
definitions, see [11] for example. Again, let $X_1, \ldots, X_n$ be a random sample
from $F \in \mathbb{R}^p$, $p \geq 1$.



Definition 3.1. $T(X_1, \ldots, X_n)$ is called a $\beta$-content tolerance interval
(or region) at confidence level $\gamma$ if

(3.1) $P(P_F(T(X_1, \ldots, X_n)) \geq \beta) = \gamma$.



Definition 3.2. The region $T(X_1, \ldots, X_n)$ is called a $\beta$-expectation
tolerance interval (or region) if

(3.2) $E(P_F(T(X_1, \ldots, X_n))) = \beta$.



In the univariate case, if the normality assumption holds but the
parameters are unknown, a tolerance interval can be constructed by

(3.3) $(\bar{X} - cS, \bar{X} + cS]$,

where $\bar{X}$ and $S$ are respectively the sample mean and standard deviation. If
Definition 3.1 is followed, [15] shows that $c$ can be approximated by

$c \approx \sqrt{\frac{(n - 1)(1 + 1/n) z^2_{(1-\beta)/2}}{\chi^2_{\gamma,n-1}}}$.

Here $\chi^2_{\gamma,n-1}$ is the $(1 - \gamma)$th quantile of the chi-square distribution with
$(n - 1)$ degrees of freedom. When the normality assumption is uncertain, Wilks
(in [29]) proposed to use the order statistics, $X_{[1]} \leq \cdots \leq X_{[n]}$, to construct
the following nonparametric tolerance interval:

(3.4) $T(X_1, \ldots, X_n) = (X_{[r]}, X_{[n-r+1]}]$,

where $r$ is a positive integer and $r < (n + 1)/2$. It has been shown that the
coverage probability of this tolerance region, namely $P_F((X_{[r]}, X_{[n-r+1]}])$,
follows a Beta distribution with parameters $(n - 2r + 1)$ and $2r$, denoted as
$Beta(n - 2r + 1, 2r)$. Based on this observation, $r$ can be chosen to satisfy

(3.5) $P(Beta(n - 2r + 1, 2r) \geq \beta) = \gamma$
or

(3.6) $E(Beta(n - 2r + 1, 2r)) = \beta$

to meet the requirement in Definition 3.1 or 3.2. Note that the tolerance
interval in (3.4) is "symmetric" around the observed center point in the sense
that the interval excludes an equal number of sample points from both tails.
Wald in [26] considered a generalization of this symmetric tolerance interval,
namely $(X_{[s]}, X_{[t]}]$, where $1 \leq s < t \leq n$. Clearly, this includes Wilks' interval
as a special case, if $s = r$ and $t = n - r + 1$. Since the coverage probability
of $(X_{[s]}, X_{[t]}]$ can be shown to follow $Beta(t - s, n - t + s + 1)$, the desired
tolerance interval for Definition 3.1 or 3.2 can be obtained by choosing $s$
and $t$ as the solutions of

(3.7) $P(Beta(t - s, n - t + s + 1) \geq \beta) = \gamma$
or

(3.8) $E(Beta(t - s, n - t + s + 1)) = \beta$.

Note that the solution for (3.7) or (3.8) may not be unique. Different appli- 
cations may impose different additional desirable properties and thus con- 
straints on the choice of s and t. One intuitively appealing and desirable 
property is that the tolerance interval (or region) be minimal. 
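For Wilks' interval, criterion (3.5) can be evaluated exactly, because a Beta distribution with integer parameters is the law of a uniform order statistic and its tail is therefore a binomial probability. The Python sketch below uses this identity. Since (3.5) can rarely hold with equality for an integer $r$, the sketch returns the largest $r$ whose content probability is at least $\gamma$; that conservative convention and the function names are assumptions of ours, not part of the original text.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j)
               for j in range(min(k, n) + 1))


def wilks_content_prob(n, r, beta):
    """P(Beta(n - 2r + 1, 2r) >= beta), the probability in (3.5)."""
    # Beta(a, b) with integer a, b is the law of the a-th order statistic of
    # a + b - 1 uniforms, so P(Beta(a, b) >= beta) = P(Bin(a + b - 1, beta) <= a - 1).
    return binom_cdf(n - 2 * r, n, beta)


def largest_wilks_r(n, beta, gamma):
    """Largest r < (n + 1) / 2 with content probability >= gamma, or None."""
    best = None
    r = 1
    while 2 * r < n + 1:
        if wilks_content_prob(n, r, beta) >= gamma:
            best = r
        r += 1
    return best
```

For $\beta = \gamma = 0.95$, the interval between the sample minimum and maximum ($r = 1$) first qualifies at $n = 93$, which agrees with the classical sample-size requirement for this setting.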

To achieve the minimal nonparametric tolerance interval, Chatterjee and
Patra in [3] proposed a large-sample approach based on nonparametric den- 
sity estimation, which yields asymptotically minimal tolerance intervals. The 
performance of this approach depends heavily on the methods used for den- 
sity estimation and smoothing. Moreover, this approach tends to be overly 
conservative, as observed in [6]. When the underlying distribution is multi- 
modal, the tolerance interval obtained by this approach may be the union 
of disjoint intervals, which is not desirable in practice. 

In the multivariate case, when F is unknown, there have been efforts to 
develop nonparametric multivariate tolerance regions. For example, Wald 
in [26] extended Wilks' approach for constructing the tolerance intervals in
the univariate case to the multivariate case by sequentially adapting it for
each coordinate. Under this method, the shape of the resulting tolerance 
region would be limited to the hyperrectangles (or rectangular blocks) with 
faces parallel to the coordinate hyperplanes. Tukey in [24] generalized Wald's 
approach to any desired shape by introducing the concept of "statistically 
equivalent blocks." However, the construction of the statistically equivalent
blocks here requires choosing a priori an ordering function and thus can be 
somewhat arbitrary. Moreover, the shape of the constructed tolerance region 
based on this predetermined ordering function may be difficult to interpret 
or implement in practice. More discussion on statistically equivalent blocks 
is given later in Remark 3.1. 

Chatterjee and Patra's approach for constructing asymptotically minimal 
tolerance intervals based on nonparametric density estimation is also appli- 
cable to the multivariate case, although it has the same drawbacks mentioned 
in the univariate case. Recently, using empirical process theory, Di Bucchi- 
anico, Einmahl and Mushkudiani [6] succeeded in developing an important 
new method for constructing the smallest nonparametric multivariate toler- 
ance regions. Although this method possesses several desirable properties, 
it still has the following potential drawbacks: (i) it requires pre-specifying 
the shape of the tolerance region; (ii) the obtained tolerance region may not 
represent well the underlying geometry of the data; and (iii) the obtained 
region may not be connected. Finally, the computation involved in finding 
this smallest tolerance region can be quite intensive. 

The above review of nonparametric multivariate tolerance regions shows 
that almost all existing approaches require specifying in advance the shape 
of the region. Most shapes specified, such as rectangular or elliptical, seem 
arbitrary and chosen mainly for mathematical convenience. If the shape is 
not chosen properly, these approaches may lead to gross misrepresentation 
of the underlying geometry of the data. 

Recall that the nonparametric tolerance interval proposed in the Wilks
approach [29] has the form $T(X_1, \ldots, X_n) = (X_{[r]}, X_{[n-r+1]}]$. From the point
of view of spacings, Wilks' tolerance interval can be easily seen as the
union of some suitable number of the univariate spacings,

(3.9) $T(X_1, \ldots, X_n) = (X_{[r]}, X_{[n-r+1]}] = \bigcup_{i=r+1}^{n-r+1} L_i$.

Similarly, the proposed multivariate spacings derived in Section 2.3 can be 
used to form tolerance regions in multivariate settings. We now give details 
on such constructions, and discuss their properties. 

3.1. Properties of tolerance regions: F is known. Consider the case where
$F$ is known, $p \geq 2$. Recall that $X_{[1]}, \ldots, X_{[n]}$ denote the depth order
statistics of the sample $X_i$'s and that $Z^{[1]}, \ldots, Z^{[n]}$ are their corresponding
depth values. Recall also that $Z_i = D_F(X_i)$ and $Z^{[1]} \geq \cdots \geq Z^{[n]}$. Then we
propose to form the tolerance region as the union of a suitable number of
the inner spacings, which can be expressed as follows:

(3.10) $O_{Z^{[r_n]}} = \bigcup_{i=1}^{r_n} MS_i = \{X : D_F(X) \geq Z^{[r_n]}\}$,

for a suitably chosen $r_n$. Here $MS_i$ is the $i$th spacing, as defined in (2.3).

Applying Theorem 2.1, the distribution of the coverage probability of 
the above tolerance region can be determined immediately, as shown in the 
following theorem. 

Theorem 3.1. The distribution of $P_F(O_{Z^{[r_n]}})$, the coverage probability
of the tolerance region defined in (3.10), follows $Beta(r_n, n + 1 - r_n)$.

Proof. Clearly, $O_{Z^{[r_n]}} = \bigcup_{i=1}^{r_n} MS_i$. It follows from Theorem 2.1 that
$\sum_{i=1}^{r_n} P_F(MS_i)$ and $\sum_{i=1}^{r_n} D_i$ are identically distributed. Here $D_i$, $i = 1, \ldots, n + 1$,
are the uniform spacings. Recalling the construction of the uniform
spacings using exponential random variables given in Section 2.2, we see then

$P_F(O_{Z^{[r_n]}})$, $\sum_{i=1}^{r_n} D_i$, and $\sum_{i=1}^{r_n} U_i / \sum_{j=1}^{n+1} U_j$,

all have the same distribution. Here $U_1, U_2, \ldots, U_{n+1}$ are i.i.d. exponential
random variables with mean 1, $Exp(1)$. Since $Exp(1)$ can also be viewed as
the Gamma random variable $Gamma(1, 1)$, $\sum_{i=1}^{r_n} U_i / \sum_{j=1}^{n+1} U_j$ is easily
shown to follow $Beta(r_n, n + 1 - r_n)$. □
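The last step of the proof can be written out as follows. This is the standard relation between independent Gamma variables and the Beta distribution; the auxiliary symbols $G_1$ and $G_2$ are ours and do not appear in the original.

```latex
\begin{align*}
G_1 &= \sum_{i=1}^{r_n} U_i \sim \mathrm{Gamma}(r_n, 1), \qquad
G_2 = \sum_{i=r_n+1}^{n+1} U_i \sim \mathrm{Gamma}(n+1-r_n, 1),
\qquad G_1 \text{ and } G_2 \text{ independent}, \\
\frac{\sum_{i=1}^{r_n} U_i}{\sum_{j=1}^{n+1} U_j}
  &= \frac{G_1}{G_1 + G_2} \sim \mathrm{Beta}(r_n,\, n+1-r_n).
\end{align*}
```

Here $G_1$ and $G_2$ are sums over disjoint sets of the i.i.d. $U_i$'s, which gives the independence that the Beta representation requires.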

To finalize constructing the proposed tolerance region in (3.10), we need
to identify a suitable $r_n$ which can satisfy Definition 3.1 or 3.2. Following
Theorem 3.1, this is equivalent to finding $r_n$ to meet the following criteria,

(3.11) $P(Beta(r_n, n + 1 - r_n) \geq \beta) = \gamma$
or

(3.12) $E(Beta(r_n, n + 1 - r_n)) = \beta$.

For (3.12), $r_n$ can be easily solved as

$r_n = (n + 1)\beta$,

since $E(Beta(a, b)) = a/(a + b)$. For (3.11), it is not easy to find an analytical
solution. Alternatively, we can obtain an approximation of the solution using
the asymptotic result stated in Theorem 3.2 below.
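Although (3.11) has no closed-form solution, for a fixed sample size it can be searched numerically, because $P(Beta(r, n + 1 - r) \geq \beta)$ equals the binomial probability $P(Bin(n, \beta) \leq r - 1)$. The Python sketch below is one way to do this and is not the authors' procedure. It returns the smallest integer $r_n$ whose content probability reaches $\gamma$, since equality is generally unattainable; that convention and the function names are our assumptions.

```python
from math import comb


def beta_upper_tail(r, n, beta):
    """P(Beta(r, n + 1 - r) >= beta) for an integer 1 <= r <= n."""
    # Equals P(Bin(n, beta) <= r - 1): at most r - 1 of n uniforms fall below beta.
    return sum(comb(n, j) * beta**j * (1.0 - beta) ** (n - j) for j in range(r))


def expectation_rank(n, beta):
    """Solution r_n = (n + 1) * beta of (3.12); it need not be an integer."""
    return (n + 1) * beta


def smallest_content_rank(n, beta, gamma):
    """Smallest integer r_n with P(Beta(r_n, n + 1 - r_n) >= beta) >= gamma, or None."""
    for r in range(1, n + 1):
        if beta_upper_tail(r, n, beta) >= gamma:
            return r
    return None
```

For $\beta = \gamma = 0.95$, no $r_n \leq n$ qualifies until $n = 59$, at which point the region must include all sample points ($r_n = 59$).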
Remark 3.1 (Multivariate spacings as statistically equivalent blocks).
For a multivariate sample of size $n$, Tukey (in [24]) considered a partition of
the sample space into $n + 1$ disjoint blocks as statistically equivalent blocks
if the following are satisfied:

(a) the coverages of the (n + 1) blocks add up to 1; 

(b) the joint distribution of the coverages of the $(n + 1)$ blocks is
completely symmetrical;

(c) if the coverages of the (n + 1) blocks are taken as barycentric coordi- 
nates on an n-simplex, the distribution over the simplex is uniform; 

(d) the sum of the coverages of any k preselected blocks of the (n + 1) 
follows Beta{k,n — k + 1). 

From the proof of Theorem 3.1, we can see that our multivariate spacings 
{MSi, . . . , MSn+i} satisfy the above conditions and can be viewed as statis- 
tically equivalent blocks. Note that the blocks as in our multivariate spacings 
are automatically determined by the given data. In contrast, the statistically 
equivalent blocks considered in [24] and other follow-ups all need to decide 
on the shape of the blocks before forming the blocks. Therefore, we view 
our multivariate spacings as an ideal data-driven realization of statistically
equivalent blocks. Moreover, the inherited property of data depth allows our
statistically equivalent blocks to follow more closely the data structure and
also be completely nonparametric.

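Theorem 3.2. As $n \to \infty$, if $r_n$ satisfies

$\sqrt{n} (r_n/n - \beta) \to \zeta_\gamma \sqrt{\beta(1 - \beta)}$,

where $\zeta_\gamma$ is the $\gamma$-quantile of the standard normal distribution [i.e., $\Phi(\zeta_\gamma) =
\gamma$], then

$P(P_F(O_{Z^{[r_n]}}) \geq \beta) \to \gamma$.

Proof. Recall that $Y_i = P_F(X : D_F(X) \geq Z_i)$, $i = 1, \ldots, n$ (denoted $T_i$ in
the proof of Theorem 2.1), with the order statistics $Y_{[1]} \leq \cdots \leq Y_{[n]}$. Let
$\omega_n = \#\{i : Y_i < \beta\}$. Then we obtain
$P(P_F(O_{Z^{[r_n]}}) \geq \beta) = P(\omega_n < r_n)$. Furthermore, since $Y_i$'s can be viewed as
an i.i.d. sample from $U[0, 1]$, $\omega_n$ follows the binomial distribution with
parameter $(n, \beta)$. Therefore, as $n \to \infty$, we have

$P(P_F(O_{Z^{[r_n]}}) \geq \beta) = P(\omega_n < r_n) \approx \Phi\left(\frac{r_n - n\beta}{\sqrt{n\beta(1 - \beta)}}\right) \to \Phi(\zeta_\gamma) = \gamma$.

□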
3.2. Properties of tolerance regions: F is unknown. If $F$ is unknown, the
tolerance region is then constructed from the sample spacings in a similar
fashion as in (3.10). More specifically, recall that $Z_i = D_{F_n}(X_i)$, $i = 1, \ldots, n$,
and that $Z^{[1]} \geq \cdots \geq Z^{[n]}$ are the descending estimated depth values
corresponding to the depth order statistics $X_{[1]}, \ldots, X_{[n]}$. The tolerance region
is then the union of a suitable number of the inner sample spacings. More
precisely, the proposed tolerance region can be expressed as

(3.13) $O_{[r_n]} = \bigcup_{i=1}^{r_n} MS_i = \{X : D_{F_n}(X) \geq Z^{[r_n]}\}$,

where $MS_i$ is the $i$th sample spacing, as defined in (2.4).

To establish the asymptotic properties for $O_{[r_n]}$ which are analogous to
those for $O_{Z^{[r_n]}}$, we require the following of the data depth $D_{F_n}(\cdot)$ used in
the derivation of the spacings.

(i) If $F$ is absolutely continuous, then $D_{F_n}(x)$ is uniformly consistent
almost surely, that is, as $n \to \infty$,

(3.14) $d_n = \sup_x |D_{F_n}(x) - D_F(x)| \to 0$ a.s.

(ii) If $F$ is an elliptic distribution with the location-scatter parameter
$(\mu, \Sigma)$ (i.e., its density assumes the form $f(x) = |\Sigma|^{-1/2} g((x - \mu)' \Sigma^{-1} (x - \mu))$),
then its elliptic contour can be expressed as $e(x) = (x - \mu)' \Sigma^{-1} (x - \mu)$.
In this case, the level sets (or contours) of $D_F(x)$ are also in the form
of $\{x : e(x) = c\}$ for some $c$. Furthermore, $D_F(x)$ is a strictly monotone
function of $c$, which implies that for any $c > 0$,

(3.15) $P(X : D_F(X) = c) = 0$.

The discussion of (i) and (ii) under the simplicial depth can be found in 
[16]. Further discussions of depth contours can be found in [13, 17] and [31]. 
Under the assumptions (i) and (ii), we now present the main results of the 
section. 

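Theorem 3.3. Assume that conditions (i) and (ii) hold for the depth
$D_{F_n}(\cdot)$ used in deriving the spacings. For any $\varepsilon > 0$, if the
sequences $r_{[n;1]}$, $r_{[n;2]}$ and $r_{[n;3]}$ ($1 \le r_{[n;j]} \le n$,
$j = 1, 2, 3$) satisfy, as $n \to \infty$,

\[
\sqrt{n}\left(\frac{r_{[n;1]}}{n} - (\beta + \varepsilon)\right) \to \zeta_\gamma\sqrt{\beta(1-\beta)},
\qquad
\sqrt{n}\left(\frac{r_{[n;2]}}{n} - (\beta - \varepsilon)\right) \to \zeta_\gamma\sqrt{\beta(1-\beta)},
\qquad
\frac{r_{[n;3]}}{n} \to \beta,
\]

then

\[
\lim_{n\to\infty} P\bigl(P_F(O_n^{\hat Z^{[r_{[n;1]}]}}) \ge \beta\bigr) \ge \gamma,
\qquad
\lim_{n\to\infty} P\bigl(P_F(O_n^{\hat Z^{[r_{[n;2]}]}}) \ge \beta\bigr) \le \gamma
\]

and

\[
\lim_{n\to\infty} E\bigl(P_F(O_n^{\hat Z^{[r_{[n;3]}]}})\bigr) = \beta.
\]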

The proof of Theorem 3.3 is somewhat involved and is given in the 
Appendix. 

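Remark 3.2. Since $\varepsilon$ in Theorem 3.3 can be arbitrarily small, to
obtain $r_n$ satisfying $P(P_F(O_n^{\hat Z^{[r_n]}}) \ge \beta) = \gamma$, we may
in practice simply take $\varepsilon = 0$ and calculate $r_n$ by solving

\[
\sqrt{n}\left(\frac{r_n}{n} - \beta\right) = \zeta_\gamma\sqrt{\beta(1-\beta)}.
\]

If $r_n$ is not an integer, we use $\lfloor r_n \rfloor$ or $\lceil r_n \rceil$,
depending on which of the following is closer to $\gamma$,

\[
P\bigl(\mathrm{Beta}(\lfloor r_n \rfloor, n + 1 - \lfloor r_n \rfloor) \ge \beta\bigr)
\quad\text{and}\quad
P\bigl(\mathrm{Beta}(\lceil r_n \rceil, n + 1 - \lceil r_n \rceil) \ge \beta\bigr).
\]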

3.3. Asymptotic minimum property. So far, we have justified the proposed
tolerance regions according to Definitions 3.1 and 3.2. Next, we will show
that under a certain class of distributions (including elliptical
distributions), the proposed tolerance regions are asymptotically minimal.
This property is clearly desirable.

Asymptotically minimal tolerance regions were first considered in [3] by
Chatterjee and Patra. Assume that the sample $X_1, X_2, \ldots, X_n$ is drawn
from a distribution $F(\cdot)$ on $\mathbb{R}^p$ which has a density function
$f(\cdot)$. Let $\lambda(\cdot)$ denote the $p$-dimensional Lebesgue measure.
Consider the distribution function of $f(X)$:

\[
G_f(v) = P_F(f(X) \le v).
\]

Assume that all level sets of $f$ have Lebesgue measure zero, namely
$\lambda\{x : f(x) = v\} = 0$ for any $v$. Chatterjee and Patra considered the
following $\beta$-content tolerance region formed by density level sets:

\[
R_{f,\beta} = \{x : f(x) \ge C_{f,1-\beta}\}, \tag{3.16}
\]

where $C_{f,1-\beta}$ is the $(1 - \beta)$-quantile of the random variable $f(X)$.
In other words, $C_{f,1-\beta}$ is a solution of $G_f(v) = 1 - \beta$. It can be
shown that

\[
P_F(R_{f,\beta}) = \beta,
\]

and that, among all subsets whose probability content with respect to $F$ is
at least $\beta$, the subset $R_{f,\beta}$ is minimal in the sense of having the
smallest Lebesgue measure.

Definition 3.3. A sequence of $\beta$-content tolerance regions $\{S_n\}$ is
called asymptotically minimal if

\[
\lambda(S_n \,\Delta\, R_{f,\beta}) \xrightarrow{P} 0 \quad \text{as } n \to \infty.
\]

Here $A \,\Delta\, B$ indicates the symmetric difference between sets $A$ and $B$.

In the finite sample case, [3] replaced $f$ with a density estimate to obtain
a sample tolerance region, and showed the asymptotic minimum property.
Clearly, the quality of the obtained tolerance region depends on the density
estimation approach used.
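To make the cutoff $C_{f,1-\beta}$ concrete, the following minimal sketch treats one case where everything is available in closed form, the standard bivariate normal. It is an illustration under that assumption and not the estimator used in [3]: the density is taken as known, and the names below are hypothetical. For this distribution $f(X)$ is uniform on $(0, 1/(2\pi))$, so the exact cutoff is $(1-\beta)/(2\pi)$ and the region $R_{f,\beta}$ is a disc.

```python
import math
import random


def density(x, y):
    """Standard bivariate normal density."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def level_set_cutoff(beta, n_draws=20000, seed=0):
    """Empirical (1 - beta)-quantile of f(X): a Monte Carlo stand-in for C_{f,1-beta}."""
    rng = random.Random(seed)
    values = sorted(density(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_draws))
    return values[int((1.0 - beta) * n_draws)]


def exact_cutoff(beta):
    """Closed-form C_{f,1-beta} for the standard bivariate normal."""
    return (1.0 - beta) / (2.0 * math.pi)


def region_area(cutoff):
    """Lebesgue measure of {x : f(x) >= cutoff}: a disc of squared radius -2*log(2*pi*cutoff)."""
    return math.pi * (-2.0 * math.log(2.0 * math.pi * cutoff))
```

With $\beta = 0.9$, `region_area(exact_cutoff(0.9))` equals $2\pi \log 10$, the smallest area of any region with probability content 0.9 under this distribution.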

Under the approach with data depth $D(\cdot)$, we consider

\[
G_D(v) = P_F(D_F(X) \le v).
\]

Denote by $\eta_{1-\beta}$ the $(1 - \beta)$-quantile of the random variable
$D_F(X)$. In other words,

\[
G_D(\eta_{1-\beta}) = P_F(D_F(X) \le \eta_{1-\beta}) = 1 - \beta.
\]

Clearly,

\[
R_{D,\beta} = \{X : D_F(X) \ge \eta_{1-\beta}\} \tag{3.17}
\]

is the true depth-based $\beta$-content tolerance region. Definition 3.3 can then
be modified for the approach using depth $D(\cdot)$ as:

Definition 3.4. A sequence of $\beta$-content tolerance regions $\{S_n\}$ is
called asymptotically minimal w.r.t. the depth function $D(\cdot)$ if

\[
\lambda(S_n \,\Delta\, R_{D,\beta}) \xrightarrow{P} 0 \quad \text{as } n \to \infty.
\]

In the next theorem and its corollary we show that our proposed tolerance
regions $O^{Z^{[r_n]}}$ and $O_n^{\hat Z^{[r_n]}}$ are asymptotically minimal
w.r.t. the chosen depth.

Theorem 3.4. If condition (3.15) holds for the underlying depth $D(\cdot)$,
$O^{Z^{[r_n]}}$ and $O_n^{\hat Z^{[r_n]}}$ are asymptotically minimal w.r.t.
$D(\cdot)$. Specifically, for $r_n$ satisfying

\[
\frac{r_n}{n} \to \beta,
\]

we have

\[
\lambda(O^{Z^{[r_n]}} \,\Delta\, R_{D,\beta}) \xrightarrow{P} 0
\quad\text{and}\quad
\lambda(O_n^{\hat Z^{[r_n]}} \,\Delta\, R_{D,\beta}) \xrightarrow{P} 0.
\]

The proof of this theorem is given in the Appendix.
Note that, for elliptical distributions, condition (3.15) holds for all the
depth notions mentioned in Section 2.1, and thus

\[
R_{D,\beta} = \{X : D_F(X) \ge \eta_{1-\beta}\} = \{x : f(x) \ge C_{f,1-\beta}\} = R_{f,\beta}.
\]

Consequently, Theorem 3.4 leads to the corollary below, which implies that
our proposed tolerance regions are asymptotically minimal under elliptical
distributions.

Corollary 3.1. For elliptical distributions, we have, under condition
(3.15), as $n \to \infty$,

\[
\lambda(O_n^{\hat Z^{[r_n]}} \,\Delta\, R_{f,\beta}) \xrightarrow{P} 0.
\]
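As a worked illustration of the identity $R_{D,\beta} = R_{f,\beta}$ (an example added here under stated assumptions, not part of the original argument), take $F = N_p(\mu, \Sigma)$ and the Mahalanobis depth in its usual form $D_F(x) = \{1 + e(x)\}^{-1}$ with $e(x) = (x - \mu)'\Sigma^{-1}(x - \mu)$. Writing $\chi^2_{p,\beta}$ for the $\beta$-quantile of the chi-square distribution with $p$ degrees of freedom, and using $e(X) \sim \chi^2_p$, both regions reduce to the same ellipsoid:

```latex
\begin{align*}
f(x) &= (2\pi)^{-p/2}\,|\Sigma|^{-1/2}\exp\{-e(x)/2\},
  \qquad D_F(x) = \{1 + e(x)\}^{-1},\\
R_{f,\beta} &= \{x : f(x) \ge C_{f,1-\beta}\} = \{x : e(x) \le \chi^2_{p,\beta}\},\\
R_{D,\beta} &= \{x : D_F(x) \ge \eta_{1-\beta}\} = \{x : e(x) \le \chi^2_{p,\beta}\},
  \qquad \eta_{1-\beta} = \{1 + \chi^2_{p,\beta}\}^{-1},\\
\lambda(R_{f,\beta}) &= \frac{\pi^{p/2}}{\Gamma(p/2 + 1)}\,|\Sigma|^{1/2}\,\bigl(\chi^2_{p,\beta}\bigr)^{p/2}.
\end{align*}
```

Both $f$ and $D_F$ are strictly decreasing in $e(x)$, which is why thresholding either one at its $(1-\beta)$-quantile cuts out the same set.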

4. Simulation and comparison studies. In this section, we present some
simulation studies to illustrate the performance of our tolerance regions
$O^{Z^{[r_n]}}$ and $O_n^{\hat Z^{[r_n]}}$. The simulation procedure is outlined
in the following steps. Assume that $F$ is absolutely continuous.

Step 1. Generate a random sample $\{X_1, X_2, \ldots, X_n\}$ from $F$. Calculate
        the depths of the $X_i$'s with respect to the data cloud and identify
        the $r_n$th largest depth (i.e., $\hat Z^{[r_n]}$) as the threshold for
        forming our tolerance region, where

        \[
        r_n = \begin{cases}
        n\beta + \sqrt{n}\,\zeta_\gamma\sqrt{\beta(1-\beta)}, & \text{if (3.1) is required,}\\
        (n + 1)\beta, & \text{if (3.2) is required.}
        \end{cases}
        \]

Step 2. Generate another random sample, $\{X_1^*, X_2^*, \ldots, X_n^*\}$, from
        $F$. Calculate the depth of each $X_i^*$ with respect to the original
        sample, $\{X_1, X_2, \ldots, X_n\}$, and obtain the proportion of
        $X_i^*$'s which assume depth value greater than the threshold obtained
        in Step 1. This proportion is denoted as $\hat\beta$.

Step 3. Repeat Step 2 $m$ times and use the average of the $m$ $\hat\beta$'s
        obtained in this manner as an approximation of
        $P_F(O_n^{\hat Z^{[r_n]}})$. Denote this average by $\bar\beta$. If a
        $\beta$-content tolerance region in (3.1) is sought, we check whether
        or not $\bar\beta \ge \beta$. Let $I_{\{\bar\beta \ge \beta\}}$ be 1 if
        the event $\{\bar\beta \ge \beta\}$ occurs, and 0 otherwise. If a
        $\beta$-expectation tolerance region in (3.2) is sought, we simply
        record $\bar\beta$.

Step 4. Repeat Steps 1-3 sufficiently many, say $M$, times. For a
        $\beta$-content tolerance region, we estimate the confidence level
        $\gamma$ by $\hat\gamma = \sum_{i=1}^{M} I_{\{\bar\beta_i \ge \beta\}}/M$.
        For a $\beta$-expectation tolerance region, we estimate $\beta$ simply
        by the average of the $M$ $\bar\beta$'s, namely
        $\tilde\beta = \sum_{i=1}^{M} \bar\beta_i/M$.
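Steps 1-4 can be sketched as follows. This is a minimal sketch under several assumptions that differ from the study reported here: it swaps the simplicial depth for the Mahalanobis depth (which has a closed form), takes $F$ to be the standard bivariate normal, rounds $r_n$ to the nearest integer instead of applying the Beta comparison of Remark 3.2, and uses much smaller default values of $n$, $m$ and $M$. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def mahalanobis_depth_fn(sample):
    """Return x -> 1 / (1 + squared Mahalanobis distance) for a 2-D sample."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy ** 2

    def depth(p):
        dx, dy = p[0] - mx, p[1] - my
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        return 1.0 / (1.0 + d2)

    return depth


def rank_threshold(n, beta, gamma=None):
    """r_n from Step 1; gamma=None selects the beta-expectation rule (n + 1)*beta."""
    if gamma is None:
        r = (n + 1) * beta
    else:
        zeta = NormalDist().inv_cdf(gamma)
        r = n * beta + math.sqrt(n) * zeta * math.sqrt(beta * (1.0 - beta))
    return min(n, max(1, round(r)))  # rounding is a simplification


def draw(rng, n):
    """n draws from the standard bivariate normal (the assumed F)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def one_replication(rng, n, m, beta, gamma=None):
    sample = draw(rng, n)                                    # Step 1
    depth = mahalanobis_depth_fn(sample)
    r = rank_threshold(n, beta, gamma)
    threshold = sorted((depth(p) for p in sample), reverse=True)[r - 1]
    props = []
    for _ in range(m):                                       # Steps 2 and 3
        fresh = draw(rng, n)
        props.append(sum(depth(p) >= threshold for p in fresh) / n)
    return sum(props) / m                                    # beta-bar


def simulate(n=200, m=5, M=30, beta=0.9, gamma=None, seed=0):
    """Step 4: estimated beta (gamma=None) or estimated confidence level gamma."""
    rng = random.Random(seed)
    bars = [one_replication(rng, n, m, beta, gamma) for _ in range(M)]
    if gamma is None:
        return sum(bars) / M
    return sum(b >= beta for b in bars) / M
```

`simulate()` returns the estimated expected coverage of the $\beta$-expectation region, and `simulate(gamma=0.95)` the estimated confidence level of the $\beta$-content region. With defaults this small the estimates are noisy and are not comparable in precision to Table 1.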
Throughout our simulation study, we set $\beta = 90\%$, $\gamma = 95\%$, $m = 100$,
$M = 1000$ and $n = 300$ and $1000$. The simplicial depth is used to calculate
all depth values unless specified otherwise. The results are presented in
Table 1. From Table 1, we can see that all the estimates are fairly close to
the nominal levels. We also present in Table 1, within brackets, the
simulation results using the approach given in [6] by Di Bucchianico, Einmahl
and Mushkudiani. The coverage from this approach is consistently lower than
the nominal value and, generally speaking, the difference between the
achieved and nominal coverage is larger than that of ours. Thus, our proposed
tolerance region is better in terms of achieving the desired tolerance level.
Moreover, as observed in [6], if the dimension of the underlying distribution
increases, the approach there needs an adjustment of the target nominal value
to reflect such a change and to prevent the achieved coverage from falling too
far below it. The adjustment suggested in [6] seems somewhat ad hoc. Adding
all these observations to the fact that our approach does not require
additional assumptions (e.g., shape of the tolerance region), our approach
clearly yields more favorable nonparametric multivariate tolerance regions.

To help visualize the outcome of our constructions, we present further
in Figure 3 our proposed tolerance region for bivariate normal and
exponential distributions. Under each distribution, the sample size is 500 and
the tolerance regions shown aim for the nominal values $\beta = 90\%$ and
$\gamma = 95\%$. Note that since there is no explicit formula for the simplicial
depth, there is no explicit expression for the proposed tolerance region
$O_n^{\hat Z^{[r_n]}} = \{X : D_{F_n}(X) \ge \hat Z^{[r_n]}\}$. (Here $r_n$ is to
be determined according to Remark 3.2.) In practice, we can simply present the
convex hull spanning all the sample points which achieve higher depth value
than $\hat Z^{[r_n]}$ as the estimated tolerance region. The algorithm provided
in [1] can help determine such convex hulls.
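The convex hull shortcut described above can be sketched for bivariate data as follows. This is an illustrative sketch, not the algorithm of [1]: it substitutes Andrew's monotone chain method, takes the depth values as given (any depth function can supply them), and uses hypothetical function names.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_tolerance_region(sample, depths, r):
    """Convex hull of the sample points whose depth is at least the r-th largest depth."""
    threshold = sorted(depths, reverse=True)[r - 1]
    central = [p for p, d in zip(sample, depths) if d >= threshold]
    return convex_hull(central)


def inside_hull(hull, q):
    """True if q lies in the closed polygon `hull` (needs >= 3 non-collinear vertices)."""
    k = len(hull)
    return all(cross(hull[i], hull[(i + 1) % k], q) >= 0 for i in range(k))
```

A new observation `q` can then be screened with `inside_hull(hull_tolerance_region(sample, depths, r), q)`, which avoids recomputing a depth value for `q`.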

Table 1
The achieved confidence levels of the proposed $\beta$-content and
$\beta$-expectation tolerance regions when $\gamma = 95\%$ and $\beta = 90\%$

                          Estimated $\gamma$        Estimated $\beta$
F                         n = 300    n = 1000    n = 300            n = 1000
Bivariate Normal          0.954      0.949       0.90131 [0.877]    0.90005 [0.887]
Bivariate Cauchy          0.963      0.961       0.90036 [0.862]    0.90061 [0.863]
Bivariate Exponential     0.941      0.943       0.90043 [0.885]    0.89985 [0.890]

Results in [] are obtained using the approach in [6].

Fig. 3. Tolerance region for: (a) a bivariate normal sample; (b) a bivariate
exponential sample.

As seen from the plots, our tolerance regions can capture the underlying
geometric shapes of the data. For the bivariate normal distribution, the
region has the elliptical shape. For the bivariate exponential distribution,
the region has a triangular shape fanning out toward the upper right. Overall,
our tolerance region focuses more on the central part of the data and also
follows the expansion of the probability mass. For example, for the bivariate
exponential distribution, our tolerance region does not include the
observations near the origin, since their positions are relatively outlying
with respect to the center of the distribution. In contrast, the tolerance
region obtained by using the method proposed in [6] must include those
observations since they have high density. However, these observations may
not be acceptable in practice, since they may be considered extreme according
to the underlying distribution. This point can be made more pronounced by
introducing a small perturbation to create a thin gap between the points near
the origin and the rest of the data. Therefore, the design of our tolerance
region in representing a central region of the data is naturally built to
prevent the region from including observations which are likely to be extreme.

Fig. 4. Tolerance regions for a bivariate normal sample: unknown F vs. known F.

As discussed in Section 3.1, when the underlying distribution $F$ is known,
we can construct the tolerance region which satisfies exactly the preset
requirement in (3.1) or (3.2). When $F$ is unknown, we propose a method
for constructing the tolerance region based on the sample only and develop
its asymptotic properties. From the asymptotic point of view, the proposed
method also satisfies the preset requirement. To assess the performance of
this proposed tolerance region in the setting of a finite sample size, we
conduct another simulation study. In the same bivariate normal setting as
above, we use the Mahalanobis depth to construct the tolerance regions
separately for the cases where $F$ is known and where $F$ is unknown. One
advantage in using the Mahalanobis depth is that it has a closed form for both
population and sample versions, and thus we can obtain the exact proposed
tolerance regions. Figure 4 shows the two tolerance regions. The dashed
ellipse is the one when $F$ is known, which is the true tolerance region. The
solid ellipse is the proposed tolerance region when $F$ is unknown. The two
regions almost coincide with each other, which indicates that the finite
sample performance of the proposed tolerance region is quite satisfactory. The
convex hull in Figure 4 is formed by only the sample points which have higher
depth value than $\hat Z^{[r_n]}$. (This generally reduces the computational
effort tremendously, and is more practical, especially when the depth itself
does not have a simple closed form.) It is not surprising that the convex hull
is located inside the solid ellipse. Note that the difference between these
two regions is not significant. Therefore, although the convex hull formed by
those central points is not the exact proposed tolerance region, it is
presented here to show that it can be a viable alternative that provides a
practical solution. To use the tolerance region for certifying
specifications, we determine whether a new observation is in the tolerance
region by first calculating its depth with respect to the given sample and
then simply comparing the depth value to the threshold $\hat Z^{[r_n]}$. This
is a relatively straightforward task in practice.

5. Concluding remarks and future research. In this paper, we introduced
multivariate spacings based on the ordering derived from data depth.
They satisfy all properties one would expect of a notion of spacings.
Moreover, they are nonparametric and they reflect well the geometry of the
underlying distribution. We showed how to use these spacings to construct
tolerance regions for multivariate distributions. The construction of our
tolerance regions can be viewed as a multi-dimensional generalization of
Wilks' method.

Given that our spacings are derived from data depth, the resulting tol- 
erance regions are always connected and naturally located in the "central" 
region of the data set. This is an important property in applications: in prac- 
tice, specifications of products are not given in disconnected patches and a 
single production line is generally designed to produce continuous measure- 
ment around a target value. The connected tolerance region ensures that 
the capability of production processes can be achieved. This point was also 
discussed and illustrated with Figure 3 in Section 4. 

One important direction for the applications of univariate spacings is
nonparametric inference, which includes many existing rank tests and
goodness-of-fit tests. A survey of these tests as well as relevant references
can be found in [22]. In forthcoming papers, we shall explore the use of our
multivariate spacings in the development of nonparametric inference,
generalize many of the existing approaches based on univariate spacings, and
study multivariate distributional characterizations using our multivariate
spacings.

APPENDIX 

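Proof of Theorem 3.3. To prove Theorem 3.3, we need the following
two lemmas.

Lemma A.1. For any $r$,
$|\hat Z^{[r]} - Z^{[r]}| \le d_n = \sup_x |D_{F_n}(x) - D_F(x)| \to 0$, a.s.

Proof. From the definition of $d_n$, we have

\[
|\hat Z_i - Z_i| = |D_{F_n}(X_i) - D_F(X_i)| \le d_n, \qquad i = 1, \ldots, n.
\]

Then for any $r$, since $Z^{[r]}$ is the $r$th largest of the $Z_i$'s,

\[
\#\{i : \hat Z_i < Z^{[r]} - d_n\} \le \#\{i : Z_i < Z^{[r]}\} \le n - r.
\]

Hence at least $r$ of the $\hat Z_i$'s are no smaller than $Z^{[r]} - d_n$, and
therefore $Z^{[r]} - d_n \le \hat Z^{[r]}$. Similarly, we can show that
$\hat Z^{[r]} - d_n \le Z^{[r]}$. The claim of the lemma thus follows. □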
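The counting argument in Lemma A.1 can be checked numerically. The sketch below is purely illustrative (hypothetical names, synthetic values standing in for depth values): it perturbs each value by at most $d$ and verifies that every order statistic moves by no more than the largest individual perturbation.

```python
import random


def descending(values):
    """Order statistics in descending order, matching Z^{[1]} >= ... >= Z^{[n]}."""
    return sorted(values, reverse=True)


def max_order_stat_gap(z, z_hat):
    """max over r of |z_hat^{[r]} - z^{[r]}| for the descending order statistics."""
    return max(abs(a - b) for a, b in zip(descending(z_hat), descending(z)))


def check_lemma(n=200, d=0.05, trials=200, seed=0):
    """Return True if the bound of Lemma A.1 holds in every simulated trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        z = [rng.random() for _ in range(n)]
        z_hat = [v + rng.uniform(-d, d) for v in z]
        d_n = max(abs(a - b) for a, b in zip(z, z_hat))  # largest pointwise error
        if max_order_stat_gap(z, z_hat) > d_n + 1e-12:
            return False
    return True
```

Note that the $r$th largest perturbed value need not come from the same index as the $r$th largest original value; the bound holds regardless, which is the content of the lemma.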

Lemma A.2. Suppose that $a_n$ and $b_n$ are two sequences of random variables
such that for some random variable $a$ taking values on $[0, \infty]$,
$a_n \to a$ and $b_n \to a$ on a positive measure subset of the sample space,
say $S$, as $n \to \infty$. Then under the assumptions (i)-(ii),

\[
P_F(O_n^{a_n} \,\Delta\, O^{b_n}) \to 0 \quad \text{a.s. on the set } S,
\]

where $O_n^{a_n} = \{x : D_{F_n}(x) \ge a_n\}$, $O^{b_n} = \{x : D_F(x) \ge b_n\}$,
and $A \,\Delta\, B = (A \cup B) \setminus (A \cap B)$.

Lemma A.2 with proof is given in [13].

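We now proceed with the proof of Theorem 3.3. Assume that $r_n/n \to q$
as $n \to \infty$; then the consistency property of a sample quantile shows that
$Z^{[r_n]} \to \eta_q$, a.s., where $\eta_q$ is the upper $q$th quantile of
$Z = D_F(X)$. Clearly, Lemma A.1 immediately implies
$\hat Z^{[r_n]} \to \eta_q$, a.s. Let $a_n = \hat Z^{[r_n]}$ and
$b_n = Z^{[r_n]}$. Following Lemma A.2, we have

\[
P_F(O_n^{\hat Z^{[r_n]}} \,\Delta\, O^{Z^{[r_n]}}) \to 0 \quad \text{a.s.} \tag{A.1}
\]

Denote $A_n = P_F(O_n^{\hat Z^{[r_n]}})$,
$B_n = P_F(O_n^{\hat Z^{[r_n]}} \,\Delta\, O^{Z^{[r_n]}})$ and
$C_n = P_F(O^{Z^{[r_n]}})$. Then, for all $\varepsilon > 0$,

\begin{align*}
P(C_n \ge \beta) &\le P(A_n + B_n \ge \beta)\\
&= P(A_n + B_n \ge \beta \cap B_n \le \varepsilon) + P(A_n + B_n \ge \beta \cap B_n > \varepsilon)\\
&\le P(A_n \ge \beta - \varepsilon) + P(B_n > \varepsilon).
\end{align*}

From (A.1), we have, as $n \to \infty$,

\[
P(B_n > \varepsilon) \to 0
\]

and

\[
\lim_{n\to\infty} P(C_n \ge \beta) \le \lim_{n\to\infty} P(A_n \ge \beta - \varepsilon) \qquad \forall \varepsilon > 0.
\]

Therefore, replacing $\beta$ by $\beta + \varepsilon$,

\[
\lim_{n\to\infty} P(A_n \ge \beta) \ge \lim_{n\to\infty} P(C_n \ge \beta + \varepsilon) \qquad \forall \varepsilon > 0.
\]

If $r_n$ satisfies

\[
\sqrt{n}\left(\frac{r_n}{n} - (\beta + \varepsilon)\right) \to \zeta_\gamma\sqrt{\beta(1-\beta)}
\quad\text{and}\quad
\lim_{n\to\infty} P(C_n \ge \beta + \varepsilon) = \gamma,
\]

then

\[
\lim_{n\to\infty} P(A_n \ge \beta) \ge \gamma.
\]

Similarly, we have

\[
P(A_n \ge \beta) \le P(C_n + B_n \ge \beta) \le P(C_n \ge \beta - \varepsilon) + P(B_n > \varepsilon),
\]

and thus

\[
\lim_{n\to\infty} P(A_n \ge \beta) \le \lim_{n\to\infty} P(C_n \ge \beta - \varepsilon) \qquad \forall \varepsilon > 0.
\]

If $r_n$ satisfies

\[
\sqrt{n}\left(\frac{r_n}{n} - (\beta - \varepsilon)\right) \to \zeta_\gamma\sqrt{\beta(1-\beta)}
\quad\text{and}\quad
\lim_{n\to\infty} P(C_n \ge \beta - \varepsilon) = \gamma,
\]

then $\lim_{n\to\infty} P(A_n \ge \beta) \le \gamma$.

Regarding the $\beta$-expectation tolerance region, we now have, following (A.1),

\[
B_n \to 0 \quad \text{a.s. as } n \to \infty.
\]

Since $B_n \le 1$, $B_n$ is uniformly integrable. Thus
$\lim_{n\to\infty} E(B_n) = 0$. Since
$E(A_n) \le E(C_n + B_n) \le E(C_n) + E(B_n)$, if $r_n$ satisfies
$r_n/n \to \beta$, then $\lim_{n\to\infty} E(C_n) = \beta$, and

\[
\limsup_{n\to\infty} E(A_n) \le \beta. \tag{A.2}
\]

Similarly, $E(C_n) \le E(A_n + B_n) \le E(A_n) + E(B_n)$. Then we have

\[
\beta \le \liminf_{n\to\infty} E(A_n). \tag{A.3}
\]

Combining (A.2) and (A.3), we obtain $\lim_{n\to\infty} E(A_n) = \beta$, and
hence the proof of Theorem 3.3. □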

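Proof of Theorem 3.4. Since $r_n/n \to \beta$,

\[
G_D(Z^{[r_n]}) = P_F(D_F(X) \le Z^{[r_n]}) \xrightarrow{P} 1 - \beta,
\]

which implies

\[
Z^{[r_n]} \xrightarrow{P} \eta_{1-\beta}. \tag{A.4}
\]

Moreover, the following

\begin{align*}
O^{Z^{[r_n]}} \,\Delta\, R_{D,\beta}
&= \{X : D_F(X) \ge Z^{[r_n]},\ D_F(X) < \eta_{1-\beta}\}\\
&\quad \cup \{X : D_F(X) < Z^{[r_n]},\ D_F(X) \ge \eta_{1-\beta}\}\\
&\subset \{X : \eta_{1-\beta} - |Z^{[r_n]} - \eta_{1-\beta}| \le D_F(X) \le \eta_{1-\beta} + |Z^{[r_n]} - \eta_{1-\beta}|\}
\end{align*}

immediately implies that

\begin{align*}
\lambda(O^{Z^{[r_n]}} \,\Delta\, R_{D,\beta})
&\le \lambda\{X : \eta_{1-\beta} - |Z^{[r_n]} - \eta_{1-\beta}| \le D_F(X) \le \eta_{1-\beta} + |Z^{[r_n]} - \eta_{1-\beta}|\}\\
&= \delta(|Z^{[r_n]} - \eta_{1-\beta}|), \tag{A.5}
\end{align*}

where $\delta(u) = \lambda\{X : \eta_{1-\beta} - u \le D_F(X) \le \eta_{1-\beta} + u\}$.
By assumption (3.15), $\delta(0) = \lambda\{X : D_F(X) = \eta_{1-\beta}\} = 0$.
Also $\delta(u)$ is right continuous at 0 because of the continuity of
$D_F(x)$. Therefore, (A.4) implies that
$\delta(|Z^{[r_n]} - \eta_{1-\beta}|) \xrightarrow{P} 0$. Following (A.5), we
finally obtain $\lambda(O^{Z^{[r_n]}} \,\Delta\, R_{D,\beta}) \xrightarrow{P} 0$.

The rest of the proof regarding $O_n^{\hat Z^{[r_n]}}$ can be derived
similarly. Following Lemma A.1 and the definition of $d_n$, we obtain

\begin{align*}
O_n^{\hat Z^{[r_n]}} \,\Delta\, R_{D,\beta}
&= \{X : D_{F_n}(X) \ge \hat Z^{[r_n]},\ D_F(X) < \eta_{1-\beta}\}\\
&\quad \cup \{X : D_{F_n}(X) < \hat Z^{[r_n]},\ D_F(X) \ge \eta_{1-\beta}\}\\
&\subset \{X : D_F(X) \ge Z^{[r_n]} - 2d_n,\ D_F(X) < \eta_{1-\beta}\}\\
&\quad \cup \{X : D_F(X) < Z^{[r_n]} + 2d_n,\ D_F(X) \ge \eta_{1-\beta}\}\\
&\subset \{X : \eta_{1-\beta} - |Z^{[r_n]} - \eta_{1-\beta}| - 2d_n \le D_F(X) \le \eta_{1-\beta} + |Z^{[r_n]} - \eta_{1-\beta}| + 2d_n\}.
\end{align*}

Finally, since $Z^{[r_n]} \xrightarrow{P} \eta_{1-\beta}$ and
$d_n \xrightarrow{P} 0$, we have

\begin{align*}
\lambda(O_n^{\hat Z^{[r_n]}} \,\Delta\, R_{D,\beta})
&\le \lambda\{X : \eta_{1-\beta} - |Z^{[r_n]} - \eta_{1-\beta}| - 2d_n \le D_F(X) \le \eta_{1-\beta} + |Z^{[r_n]} - \eta_{1-\beta}| + 2d_n\}\\
&= \delta(|Z^{[r_n]} - \eta_{1-\beta}| + 2d_n) \xrightarrow{P} 0.
\end{align*}

This completes the proof. □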